The Reliability and Validity of Short Form-12 Health Survey Version 2 for Chinese Older Adults.

Background
We assessed the psychometric properties of the Short Form-12 Health Survey Version 2 (SF-12v2) among older adults in China.


Methods
A cross-sectional study was conducted on a stratified representative sample of older adults (≥60 years) residing in community and nursing home settings in 2017-18. Reliability was estimated using the internal consistency method. Validity was assessed using convergent and discriminant validity checks, factor analyses (including both exploratory and confirmatory factor analyses [EFA and CFA]), and "known groups" construct validity.


Results
The final sample comprised 1000 older adults (451 community-dwelling and 549 institutional). Cronbach's α was 0.81 for the Physical Component Summary (PCS) and 0.83 for the Mental Component Summary (MCS), showing satisfactory internal consistency for both. Most items were strongly correlated with their represented component (Spearman's correlation coefficient: 0.62-0.87), although the correlation of SF items with PCS was slightly stronger than that with MCS. A two-factor structure (physical and mental health) indicated by EFA jointly accounted for 68.50% of the variance and presented adequate goodness-of-fit indices (GFI=0.98, AGFI=0.92, RMSEA=0.08, 90% CI RMSEA=0.06 to 0.11, NFI=0.98, and CFI=0.98) in CFA. Known-groups comparison showed that SF-12v2 summary scores differentiated subgroups of older adults by age, marital status, and self-reported health problems (P≤0.05).


Conclusion
SF-12v2 is a reliable and valid health-related quality of life instrument for Chinese older adults that works equally well with older adults under institutional care and community-based home care models.

SF-12v2's brevity is an advantage in assessing older adults' HRQoL in large-scale studies, resulting in less burden for respondents and researchers (7). SF-12v2 is a reliable, valid measure in a variety of population groups (6,8-12), but information regarding its psychometric properties among older adults in China is still lacking. Two recent studies in Hong Kong have demonstrated validity and reliability for the general population and adolescents there (6,13). However, no study has verified the instrument's usability with older adults in China proper. We used a large sample of Chinese older adults to confirm the psychometric performance (reliability and validity) of SF-12v2 in this subpopulation, so that it can be confidently applied among them to promote healthy and active aging.

Sample and data collection
A cross-sectional study was conducted from June 2017 to January 2018 among a sample (≥60 years) residing in community and nursing home settings in Guangzhou, the capital of Guangdong Province. According to the WHO and the United Nations (14), people over 60 years of age in developing countries are defined as older adults. The elderly population of Guangzhou (17.3% in 2015, or 1,475,300 older adults) is slightly above the national average; they constituted the survey population (15). Due to economic and cultural similarities, our research results may be extended to other Chinese people in economically developed regions of China (e.g., eastern coastal cities). Dementia patients, psychosis patients, and older adults with cognitive or communicative impairments were excluded.
In community settings, this study used a three-stage sampling method. In the first stage, six districts were selected randomly according to elderly population size to provide a representative sample. In the second, one to three communities were randomly selected by systematic sampling. In the third stage, qualified participants were selected through simple random sampling in each community. In nursing home settings, a two-stage sampling method was applied. First, two nursing homes large enough to provide a representative sample (nursing beds≥1000) were selected randomly. Next, eligible participants were randomly selected from the accommodation registration record in each nursing home. A total of 800 questionnaires were distributed in each of the two settings (community and nursing home). Of the 1600 older adults initially approached, 1014 (total response rate 63.4%; response rate for community settings 57.0%; response rate for nursing home settings 69.8%) voluntarily agreed to be surveyed face to face by trained investigators. Informed consent was obtained from all participants.
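The reported response rates can be cross-checked with simple arithmetic. The sketch below is illustrative only: the per-setting respondent counts are not stated in the text and are back-calculated here from the reported percentages, so they should be read as implied values, not reported data. All variable names are hypothetical.

```python
# Figures reported in the text.
distributed_per_setting = 800
total_respondents = 1014
reported_rates = {"community": 0.570, "nursing_home": 0.698}

# Implied respondent counts (assumption: obtained by rounding rate x 800;
# the paper itself does not list these counts).
implied_counts = {
    setting: round(rate * distributed_per_setting)
    for setting, rate in reported_rates.items()
}

total_distributed = distributed_per_setting * len(reported_rates)
overall_rate = total_respondents / total_distributed

print(implied_counts)
print(f"overall response rate: {overall_rate:.1%}")
```

Under this assumption the implied per-setting counts sum to the 1014 respondents reported, which suggests the three percentages are mutually consistent.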

Questionnaire and scoring
General information was collected with a self-developed questionnaire covering aged care model, age, gender, education level, marital status, chronic diseases, etc. Self-report questions about chronic diseases and health problems were asked by trained investigators who were recruited from a medical university and thus had good knowledge of chronic diseases. Participants were asked "Has a doctor ever told you that you had chronic diseases (for example, hypertension, diabetes, heart problem, osteoarthritis, eye problem, ear problem or others)?" Among the above questions, eye problems included age-related maculopathy, glaucoma, cataract, etc., while ear problems included otitis media, difficulty hearing, etc.

Quality control
The questionnaire was administered face to face; answers were checked onsite. Respondents were asked to correct or complete any double or missing answers. To ensure data quality, remaining invalid questionnaires, defined as those including missing data or double answers, were excluded from the analysis.

Statistical analysis
Descriptive analysis for continuous variables was performed using means and standard deviations, while categorical variables were reported using frequencies and percentages. Floor and ceiling effects of PCS-12 and MCS-12 were determined by percentages of sample participants with lowest and highest possible scores. For the total sample and subgroups under different aged care models, reliability was estimated using the internal consistency method; Cronbach's α coefficient equal to or greater than 0.70 was considered satisfactory (11). Validity was assessed using convergent and discriminant validity checks, factor analyses (exploratory factor analysis [EFA] and confirmatory factor analysis [CFA]), and "known groups" construct validity.
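For readers unfamiliar with these two checks, the following minimal sketch shows how Cronbach's α and floor/ceiling percentages could be computed. It is not the authors' SPSS procedure; the function names and the demo responses are hypothetical, and the lowest and highest possible scores are passed in as parameters because the text does not state them.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def floor_ceiling_pct(scores, lowest, highest):
    """Percentage of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor = 100 * sum(s == lowest for s in scores) / n
    ceiling = 100 * sum(s == highest for s in scores) / n
    return floor, ceiling

# Made-up responses (5 respondents x 3 items), for illustration only.
demo_rows = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(demo_rows)
print(f"alpha = {alpha:.2f}, satisfactory (>= 0.70): {alpha >= 0.70}")
```

The 0.70 threshold in the last line is the criterion stated in the text; everything else in the demo is invented data.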
In terms of convergent validity, all hypothesized item-component correlations, corrected for overlap, should be 0.40 or above (16). In terms of discriminant validity, hypothesized item-component correlations should be significantly higher than the alternative item-component correlations (10). Spearman's correlation coefficient (ρ) was used to calculate the correlations. Exploratory factor analysis and confirmatory factor analysis were used to extract the factor structure of SF-12v2. For the total sample and the subgroups under different aged care models, EFA was performed using principal components analysis with varimax rotation. For legitimacy of the analysis, we confirmed that the KMO index was >0.70, and Bartlett's sphericity test provided a significant result. It was assumed that two principal components would be obtained with eigenvalues greater than 1. For the total sample and subgroups under different aged care models, CFA was performed using a two-factor model (PCS and MCS), which was the theoretical structure of SF-12v2 (10). Acceptable goodness-of-fit values included goodness-of-fit index (GFI), adjusted goodness-of-fit index (AGFI), normed fit index (NFI), and comparative fit index (CFI) more than 0.9 and root-mean-square error of approximation (RMSEA) values less than 0.08 (17). "Known groups" construct validity was assessed by testing hypothesized relationships between subgroups of the study sample and SF-12v2 component scores. It was expected that older participants, widowed or divorced persons, and those with one or more chronic diseases would report poorer health (7,10,18,19). The t-test was used for comparison. Statistical analysis was carried out using SPSS version 20.0 (Chicago, IL, USA) and AMOS version 22. The datasets used are available from the corresponding author on reasonable request.
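The convergent validity criterion (item-component correlation corrected for overlap, ρ ≥ 0.40) can be illustrated with a short sketch. This is a simplification under stated assumptions: the "component" here is the plain sum of the remaining items, whereas SF-12v2 summary scores are norm-based weighted scores, and all function names and demo data are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def corrected_item_correlation(rows, item):
    """Rho between one item and the sum of the other items (overlap removed)."""
    item_scores = [r[item] for r in rows]
    rest_scores = [sum(v for j, v in enumerate(r) if j != item) for r in rows]
    return spearman(item_scores, rest_scores)

# Made-up responses (5 respondents x 3 items), for illustration only.
demo_rows = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
rho = corrected_item_correlation(demo_rows, item=0)
print(f"corrected rho = {rho:.2f}, meets 0.40 criterion: {rho >= 0.40}")
```

Removing the item from its own total is what "corrected for overlap" refers to: without it, the item would be correlated partly with itself and the coefficient would be inflated.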

Results
In total, 1600 older adults from community settings and nursing home settings were approached. Of these, 1014 agreed to take part in the investigation, for a response rate of 63.4%.
Among them, 1000 valid questionnaires were included in the analysis, while 14 invalid questionnaires with missing or double answers were excluded. These 1000 participants (66.8% women) were aged from 60 to 108 years (Mean=78.34; SD=9.38); 45.1% came from community settings and 54.9% from nursing home settings. The overall education level of the older adults was low: 14.9% were illiterate (had less than elementary school education), 26.7% had elementary school education, 18.6% middle school, 18.2% high school, and 21.6% an associate degree or higher. The data were 100% complete in the valid questionnaires, benefiting from the face-to-face administration and the strict definition of invalid questionnaires.
As shown in Table 2, all correlations between items and hypothesized scale ranged from 0.87-1.00, significantly higher than the alternative item-scale correlations. Moreover, correlation analysis showed that items in PF, RP, BP, GH, and SF subscales correlated more strongly with the PCS-12 score than with the MCS-12 score, with correlations ranging from 0.62-0.87, while items in VT, RE, and MH correlated more strongly with MCS-12, with correlations ranging from 0.63-0.82.
In the total sample group, the distribution of items on SF-12v2 allowed the use of EFA (KMO=0.90; Bartlett's P<0.001). The two-factor conceptual structure of SF-12v2 items in the Chinese elderly population was confirmed by the scree plot (Fig. 2) and principal components analysis (Table 3). Eigenvalues for the two factors (physical and mental health) that explained most of the variance observed were 6.69 and 1.51 respectively; this two-factor structure jointly accounted for 68.37% of variance. Principal components analysis, after varimax rotation, showed that items assessing PF, RP, BP, GH, and SF domains loaded higher on the physical component, whereas items assessing MH loaded higher on the mental component. VT and RE items loaded on both components.
The two-factor conceptual structure was also confirmed in the subpopulations under different aged care models, which showed similar results to the total sample group in principal components analysis (Table 3).
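The link between the reported eigenvalues and the share of variance explained can be reproduced in a few lines. The sketch assumes the principal components analysis was run on the correlation matrix of the 12 items (so total variance equals 12); the text does not state this explicitly, and the variable names are hypothetical.

```python
# Assumption: PCA on the 12 x 12 item correlation matrix, so total variance = 12.
n_items = 12
eigenvalues = [6.69, 1.51]  # first two eigenvalues as reported (rounded)

# Kaiser criterion: retain components with eigenvalue greater than 1.
retained = [ev for ev in eigenvalues if ev > 1]
explained_pct = 100 * sum(retained) / n_items

print(f"{len(retained)} components retained, {explained_pct:.2f}% of variance")
```

With the rounded eigenvalues this gives roughly 68.3%, which agrees with the reported 68.37% to within the rounding of the eigenvalues.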
In the total sample group, the results for confirmatory factor analysis of the two-factor model are shown in Fig. 3. The proposed two-factor model, which included PCS-12 and MCS-12, produced adequate goodness-of-fit indices (GFI=1.00, AGFI=0.97, RMSEA=0.05, 90% CI RMSEA=0.03 to 0.08, NFI=1.00, and CFI=1.00) in confirmatory factor analysis. It demonstrated that the SF-12v2 worked equally well with subpopulations under these two aged care models. Significant differences were observed between PCS-12 summary scores by age group (Table 4): the oldest people (age≥80 years old) scored lower than the "younger older" adults (age=60-79 years old; P≤0.001). Furthermore, respondents who were widowed/divorced/single/separated demonstrated significantly lower scores than married older adults on both SF-12v2 component scores (P≤0.01). In addition, compared with respondents reporting specific health problems, those without such problems exhibited significantly higher mean PCS-12 and MCS-12 scores (Table 4). As seen in Table 4, each specific health problem was significantly associated with reduction in at least one summary score, indicating that SF-12v2 is responsive to the presence of specific health problems among respondents.
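A known-groups comparison of this kind reduces to a two-sample t-test on component scores. The sketch below computes the pooled-variance (Student) t statistic and its degrees of freedom; the text does not say whether equal variances were assumed, so the choice of the pooled form is an assumption, and the function name and scores are hypothetical. Obtaining a P value would additionally require a t-distribution, which is outside this sketch.

```python
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled sample variance across the two groups.
    sp2 = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se, df

# Made-up summary scores for two subgroups, for illustration only.
with_condition = [38.0, 41.5, 36.2, 44.1, 39.7]
without_condition = [45.3, 48.9, 43.0, 50.2, 46.6]
t_stat, df = pooled_t(with_condition, without_condition)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A negative t here means the first group has the lower mean score, which is the direction hypothesized for respondents reporting a health problem.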

Discussion
Our study reported the reliability (internal consistency) and validity (convergent validity, discriminant validity, factorial validity, known-groups validity) of the Chinese version of SF-12v2, a widely used generic HRQoL instrument, among the elderly population in Guangzhou. The results provided strong evidence that SF-12v2 is a reliable, valid instrument for measuring and monitoring HRQoL in this population. As PCS-12 and MCS-12 scores were calculated by norm-based scoring algorithms, our results constitute a reference for cross-cultural comparisons in the HRQoL domain. In addition, the data were 100% complete in valid questionnaires, as face-to-face questionnaire administration allowed data quality to be checked onsite; however, the findings might not hold with a self-administered questionnaire, an approach shown to lead to high incompletion (20). Internal consistency reliability was >0.7 for the two component summary domains and the eight subscales, indicating that SF-12v2 is reliable for older adults in Guangzhou coming from different aged care models. This is higher than the reliability for the general population of Hong Kong (6) and comparable with that for older adults in southern Sweden (17). For both PCS and MCS, no floor or ceiling effects were observed, indicating that these summary scores sensitively measured variation in older adults' health status. SF-12v2 had good convergent and discriminant validity. Most items were strongly correlated with their represented component: the correlation of PF, RP, BP, and GH items with PCS was stronger than with MCS, and vice versa for RE, VT, and MH items. However, the correlation of SF item with PCS was unexpectedly a bit stronger than with MCS. Similar results were found in a cross-sectional postal survey of 8500 older adults in southern Sweden (17). For older adults, physical health might be an important factor in whether to participate in social activities (with friends or family members, at home or out) (21). 
Even those who used to actively participate in social activities or were still very interested in doing so often found it difficult owing to poor physical health. Therefore, social functioning may be more highly related to the physical health component than the mental health component in the elderly population. What is more, items of RP and GH also had moderate correlations with MCS, items of SF had a relatively high correlation with MCS, and items of VT had a relatively high correlation with PCS. The reason might be related to the socio-demographic characteristics of older adults in China: the participants were generally not well educated, and varied greatly in income (22). Further adjustment to the instrument may thus be needed to suit Chinese culture better. Furthermore, older adults' comprehension and memory ability generally deteriorate with age. Thus, their understanding of abstract concepts such as "vitality," "general health," and "social activity" might also vary. All the above factors might affect the subjective judgments of the participants. In agreement with the original SF-36, items of GH, SF, and VT subdomains correlated highly with both MCS and PCS (11,23). RP items correlated moderately with MCS, which may be explained by noting that among Chinese older adults, RP items are understood to some extent as indications of mental as well as physical health. Similarly, items of RP, GH, VT, and SF were reported to correlate with both PCS and MCS (correlations≥0.4) in the Greek general population (7). 
However, in that study, VT items correlated more highly with PCS and SF items with MCS, which differs slightly from the present findings. Principal components analysis with varimax rotation supported a two-factor structure, the original conceptual model of SF-12v2. All item-factor loadings were confirmed except for SF items, which loaded more highly on PCS than on MCS, and VT items, which tended to cross-load on both summary components. Similar to a general-population study in the city of Chengdu, two factors (physical and mental health) were extracted in exploratory factor analysis (24). What is more, a two-dimensional item structure was also found in southern Sweden among older adults with Parkinson's disease, though that study identified a three-dimensional item structure among general older adults and older adults with stroke (17). A Korean general population study also reported a three-factor structure in exploratory factor analyses (8). Moreover, here, confirmatory factor analysis indicated that the theoretical two-factor structure fitted the data from older adults in Guangzhou very well. Many studies show similar results in general populations (9,10) and people with specific diseases (11,12); what is more, a study among Chinese older adults in central Shanghai also found that a two-factor structure fit the data well (25). In known-groups comparisons, SF-12v2 summary scores differentiated subgroups of older adults by age, marital status, and self-reported health problems, providing evidence of construct validity. Consistent with previous studies (10), our findings also showed that older adults with chronic disease had poorer SF-12v2 summary scores than those without chronic disease; similarly, older adults who reported hypertension, diabetes, heart problems, osteoarthritis, eye problems, or ear problems had significantly lower mean PCS-12 and MCS-12 scores than those without. 
These results are consistent with results of studies in Greece (7) and Italy (26). There were a few limitations to this study. First, given the cross-sectional design, we could not provide evidence for test-retest reliability or longitudinal construct validity of SF-12v2. Second, participants were likely healthier than older adults in general, as they were able to self-report. Third, recall bias might be present, due to the self-report method. The generalizability of study results may be limited by these factors, and further studies on the psychometric properties of the Chinese SF-12v2 among older adults are needed. Despite these limitations, SF-12v2 demonstrated validity and reliability among an elderly population in Guangzhou, a typical big city in China.

Conclusion
SF-12v2 works equally well in older adults under an institutional care model and under a community-based home care model. The psychometric performance of SF-12v2 was satisfactory to indicate the health of these Chinese older adults, scientifically justifying the use of this health measurement tool with them. This will simplify the use of health indicators for older adults in clinical studies, especially large-scale ones. As the population ages further, healthcare expenditures will increase. Selecting suitable HRQoL measurement and monitoring tools for older adults will help manage their health in advance or predict major issues, promoting healthy, active aging.

Ethical considerations
Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.
cy Research and Evaluation" Key Laboratory (2015WSYS0010), and a grant from the Public Health Service System Construction Research Foundation of Guangzhou.